11 research outputs found

    Bayesian Spatio-Temporal Modeling for Forecasting, Trend Assessment and Spatial Trend Filtering

    This work develops Bayesian spatio-temporal modeling techniques specifically aimed at studying several aspects of our motivating applications, which include vector-borne disease incidence and air pollution levels. A key attribute of the proposed techniques is that they are scalable to extremely large data sets consisting of spatio-temporally oriented observations. The scalability of our modeling strategies is accomplished in two primary ways. First, through the introduction of carefully constructed latent random variables, we are able to develop Markov chain Monte Carlo (MCMC) sampling algorithms that consist primarily of Gibbs steps. This leads to fast and easy updating of the model parameters from common distributions. Second, for the spatio-temporal aspects of the models, a novel sampling strategy for Gaussian Markov random fields (GMRFs) that can be easily implemented (in parallel) within MCMC sampling algorithms is used. The performance of the proposed modeling strategies is demonstrated through extensive numerical studies, and the strategies are further used to analyze vector-borne disease data measured on canines throughout the conterminous United States and PM 2.5 levels measured at weather stations throughout the Eastern United States. In particular, we begin by developing a Poisson regression model that can be used to forecast the incidence of vector-borne disease throughout a large geographic area. The proposed model accounts for spatio-temporal dependence through a vector autoregression and is fit through a Metropolis-Hastings based MCMC sampling algorithm. The model is used to forecast the prevalence of Lyme disease (Chapter 2) and Anaplasmosis (Chapter 3) in canines throughout the United States. As a part of these studies, we also evaluate the significance of various climatic and socio-economic drivers of disease. We then present (Chapter 4) the development of the 'chromatic sampler' for GMRFs.
The chromatic sampler is an MCMC sampling technique that exploits the Markov property of GMRFs to sample large groups of parameters in parallel. A greedy algorithm for finding such groups of parameters is presented. The methodology is found to be superior, in terms of computational effort, to both full block and single-site updating. For assessing spatio-temporal trends, we develop (Chapter 5) a binomial regression model with spatially varying coefficients. This model uses Gaussian predictive processes to estimate spatially varying coefficients and a conditional autoregressive structure embedded in a vector autoregression to account for spatio-temporal dependence in the data. The methodology is capable of estimating both widespread regional and small-scale local trends. A data augmentation strategy is used to develop a Gibbs based MCMC sampling routine. The approach is made computationally feasible by adopting the chromatic sampler for GMRFs to sample the spatio-temporal random effects. The model is applied to a dataset consisting of 16 million test results for antibodies to Borrelia burgdorferi and used to identify several areas of the United States experiencing increasing Lyme disease risk. For nonparametric functional estimation, we develop (Chapter 6) a Bayesian multidimensional trend filter (BMTF). The BMTF is a flexible nonparametric estimator that extends traditional one-dimensional trend filtering methods to multiple dimensions. The methodology is computationally scalable to a large support space, and the expense of fitting the model is nearly independent of the number of observations. The methodology involves discretizing the support space and estimating a multidimensional step function over the discretized support. Two adaptive methods of discretization, which allow the data to determine the resolution of the resulting function, are presented.
The BMTF is then used (Chapter 7) to allow for spatially varying coefficients within a quantile regression model. A data augmentation strategy is introduced which facilitates the development of a Gibbs based MCMC sampling routine. This methodology is developed to study various meteorological drivers of high levels of PM 2.5, a particularly hazardous form of air pollution consisting of particles less than 2.5 micrometers in diameter.
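As a rough illustration of the grouping idea described in the abstract, the sketch below applies a standard greedy graph coloring to a small lattice. It is not taken from the thesis: the function names (`lattice_neighbors`, `greedy_coloring`, `color_blocks`), the first-order lattice neighbourhood, and the row-major visiting order are all assumptions made for the example. The point it shows is that nodes sharing a colour have no neighbours in common with each other, so under the Markov property they are conditionally independent given the remaining nodes and could, in principle, be updated together.

```python
from collections import defaultdict


def lattice_neighbors(n_rows, n_cols):
    """Build a first-order (rook) neighbourhood on an n_rows x n_cols lattice.

    Hypothetical helper: the thesis may use a different graph structure.
    """
    nbrs = defaultdict(set)
    for i in range(n_rows):
        for j in range(n_cols):
            node = i * n_cols + j
            if i + 1 < n_rows:  # link to the node directly below
                below = (i + 1) * n_cols + j
                nbrs[node].add(below)
                nbrs[below].add(node)
            if j + 1 < n_cols:  # link to the node directly to the right
                right = node + 1
                nbrs[node].add(right)
                nbrs[right].add(node)
    return nbrs


def greedy_coloring(nbrs, n_nodes):
    """Assign each node the smallest colour not used by an already-coloured neighbour."""
    colors = {}
    for node in range(n_nodes):
        used = {colors[m] for m in nbrs[node] if m in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors


def color_blocks(colors):
    """Group node indices by colour; each group contains no pair of neighbours."""
    blocks = defaultdict(list)
    for node, c in colors.items():
        blocks[c].append(node)
    return [blocks[c] for c in sorted(blocks)]


nbrs = lattice_neighbors(4, 4)
colors = greedy_coloring(nbrs, 16)
blocks = color_blocks(colors)
for c, block in enumerate(blocks):
    print(f"colour {c}: {block}")
```

In a sampler built on such a grouping, one sweep would loop over the colour blocks and draw every node in a block from its full conditional at once; the drawing step itself is omitted here because the abstract does not specify it.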

    Upstate system results.

    The figure displays the results for the non-ventilated census (column 1), ventilated census (column 2), and SARS-CoV-2 positive admissions (column 3) for the Upstate system from the models fit using data from March 6th, 2020 to June 1st, 2020 (row 1), July 1st, 2020 (row 2), and August 1st, 2020 (row 3), and from November 1st, 2020 to December 1st, 2020 (row 4), January 1st, 2021 (row 5), and February 1st, 2021 (row 6). The red shaded regions denote 95% prediction intervals, the blue lines denote the median estimators, the black points denote the observed data used to fit the model, and the red points denote observed data from the 28-day forecast period (not used to fit the model).

    An illustration of the SIHVR model (top) and SEAPSHVR model (bottom).

    Rectangles represent compartments, and arrows indicate the flow of individuals between compartments.

    Summary of simulation study results.

    The table provides the posterior mean estimate, empirical bias, MSE, standard deviation, and coverage probability for 95% credible intervals, averaged over all 500 datasets.

    IHME comparison.

    The figure compares the performance of the Bayesian SIHVR model to that of the IHME model. Depicted are the median (dashed red line) and 95% prediction interval (shaded red) from the Bayesian SIHVR model and the mean (dashed blue line) and 95% uncertainty interval (shaded blue) from the IHME model for the non-ventilated census (left) and ventilated census (right). Observed data points from the 28-day forecast period (not used to fit either model) are shown in black.

    Midlands system results.

    The figure displays the results for the non-ventilated census (column 1), ventilated census (column 2), and SARS-CoV-2 positive admissions (column 3) for the Midlands system from the models fit using data from March 6th, 2020 to June 1st, 2020 (row 1), July 1st, 2020 (row 2), and August 1st, 2020 (row 3), and from November 1st, 2020 to December 1st, 2020 (row 4), January 1st, 2021 (row 5), and February 1st, 2021 (row 6). The red shaded regions denote 95% prediction intervals, the blue lines denote the median estimators, the black points denote the observed data used to fit the model, and the red points denote observed data from the 28-day forecast period (not used to fit the model).

    Simulation study results.

    The figure displays the posterior median (dark blue), true value used for data generation (light blue), and 95% prediction interval (red) for the non-ventilated census (column 1), ventilated census (column 2), and admissions (column 3). From top to bottom, the rows correspond to DGMs 1–5 with T = 118.

    Web appendix.

    The web appendix contains a description of the MCMC sampling algorithm (Web Appendix A), additional details regarding the simulation study (Web Appendix B), and additional details and figures regarding estimation of reported area-level COVID-19 case incidence with the Bayesian SIHVR model (Web Appendix C). (PDF)

    Simulation study results.

    The figure displays the posterior median (dark blue), true value used for data generation (light blue), and 95% prediction interval (red) for the non-ventilated census (column 1), ventilated census (column 2), and admissions (column 3). From top to bottom, the rows correspond to DGMs 1–5 with T = 57.